Movie ratings analysis is the process of examining and summarizing the ratings given to movies by audiences or critics. It can show which types of movies are popular, how well they are received, and which factors influence their ratings. Movie producers and marketers can use these insights to understand what audiences look for in a movie and to plan strategies that improve their chances of success in the industry.
## Now let’s get started with the task of movie rating analysis by importing the necessary Python libraries and the datasets:
import numpy as np
import pandas as pd
# '::' is a multi-character separator, so the Python parsing engine is requested explicitly
movies = pd.read_csv("F:/Swapnil/portfolio/projects portfolio/Movie Rating Analysis/movies.dat", delimiter='::', engine='python')
print(movies.head())
0000008 Edison Kinetoscopic Record of a Sneeze (1894) \
0 10 La sortie des usines Lumière (1895)
1 12 The Arrival of a Train (1896)
2 25 The Oxford and Cambridge University Boat Race ...
3 91 Le manoir du diable (1896)
4 131 Une nuit terrible (1896)
Documentary|Short
0 Documentary|Short
1 Documentary|Short
2 NaN
3 Short|Horror
4 Short|Comedy|Horror
## Let’s define the column names:
movies.columns = ["ID", "Title", "Genre"]
print(movies.head())
   ID  Title  Genre
0  10  La sortie des usines Lumière (1895)  Documentary|Short
1  12  The Arrival of a Train (1896)  Documentary|Short
2  25  The Oxford and Cambridge University Boat Race ...  NaN
3  91  Le manoir du diable (1896)  Short|Horror
4  131  Une nuit terrible (1896)  Short|Comedy|Horror
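Note that the first `print(movies.head())` shows a movie (Edison Kinetoscopic Record of a Sneeze) in the header position: `read_csv` treats the first line of a file as column labels by default, so renaming the columns afterwards drops that row. A minimal sketch of one way to avoid this, assuming the file has no header row. The in-memory sample below is a hypothetical stand-in for `movies.dat`, and `movies_sample` is an illustrative name:

```python
import io

import pandas as pd

# Hypothetical stand-in for movies.dat (assumption: the real file has no header row).
sample = io.StringIO(
    "0000008::Edison Kinetoscopic Record of a Sneeze (1894)::Documentary|Short\n"
    "0000010::La sortie des usines Lumière (1895)::Documentary|Short\n"
    "0000012::The Arrival of a Train (1896)::Documentary|Short\n"
)

# header=None keeps the first line as data; names= supplies the column labels.
movies_sample = pd.read_csv(
    sample,
    sep="::",
    engine="python",
    header=None,
    names=["ID", "Title", "Genre"],
)
print(movies_sample)
```

With this approach the separate `movies.columns = [...]` step is no longer needed, and the first record stays in the data.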
## Now let’s import the ratings dataset:
# '::' is a multi-character separator, so the Python parsing engine is requested explicitly
ratings = pd.read_csv("F:/Swapnil/portfolio/projects portfolio/Movie Rating Analysis/ratings.dat", delimiter='::', engine='python')
print(ratings.head())
   1  0114508  8  1381006850
0  2  499549  9  1376753198
1  2  1305591  8  1376742507
2  2  1428538  1  1371307089
3  3  75314  1  1595468524
4  3  102926  9  1590148016
## Let’s define the column names of this dataset as well:
ratings.columns = ["User", "ID", "Ratings", "Timestamp"]
print(ratings.head())
   User  ID  Ratings  Timestamp
0  2  499549  9  1376753198
1  2  1305591  8  1376742507
2  2  1428538  1  1371307089
3  3  75314  1  1595468524
4  3  102926  9  1590148016
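The `Timestamp` values look like Unix epoch seconds, although the source does not state this. A minimal sketch of converting them to readable dates under that assumption; `ratings_sample` and the `RatedAt` column are illustrative names, and the rows are copied from the output above:

```python
import pandas as pd

# A few rows copied from the ratings output above.
ratings_sample = pd.DataFrame(
    {
        "User": [2, 2, 3],
        "ID": [499549, 1305591, 75314],
        "Ratings": [9, 8, 1],
        "Timestamp": [1376753198, 1376742507, 1595468524],
    }
)

# Assumption: Timestamp holds seconds since 1970-01-01 UTC.
ratings_sample["RatedAt"] = pd.to_datetime(ratings_sample["Timestamp"], unit="s")
print(ratings_sample[["Timestamp", "RatedAt"]])
```

A datetime column like this makes it possible to group ratings by year or month later on.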
## Now let’s merge the two datasets into one. Both have an ID column that contains the movie ID, so we can use it as the key for the merge:
data = pd.merge(movies, ratings, on="ID")
print(data.head())
ID Title Genre \
0 10 La sortie des usines Lumière (1895) Documentary|Short
1 12 The Arrival of a Train (1896) Documentary|Short
2 25 The Oxford and Cambridge University Boat Race ... NaN
3 91 Le manoir du diable (1896) Short|Horror
4 91 Le manoir du diable (1896) Short|Horror
User Ratings Timestamp
0 70577 10 1412878553
1 69535 10 1439248579
2 37628 8 1488189899
3 5814 6 1385233195
4 37239 5 1532347349
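By default, `pd.merge` performs an inner join, so ratings whose ID has no match in the movies table are dropped without notice. The sketch below shows one way to check this, using the `validate` and `indicator` parameters of `pd.merge`. The small frames are illustrative, and the rating with ID 999 is a hypothetical unmatched row added for the demonstration:

```python
import pandas as pd

movies_sample = pd.DataFrame(
    {
        "ID": [10, 12, 91],
        "Title": [
            "La sortie des usines Lumière (1895)",
            "The Arrival of a Train (1896)",
            "Le manoir du diable (1896)",
        ],
        "Genre": ["Documentary|Short", "Documentary|Short", "Short|Horror"],
    }
)
ratings_sample = pd.DataFrame(
    {
        "User": [70577, 69535, 5814, 37239, 1],
        "ID": [10, 12, 91, 91, 999],  # 999 is a hypothetical ID with no movie
        "Ratings": [10, 10, 6, 5, 7],
    }
)

# validate raises an error if a movie ID appears more than once in movies_sample.
merged = pd.merge(movies_sample, ratings_sample, on="ID", how="inner", validate="one_to_many")

# An outer merge with indicator=True marks rows found in only one of the tables.
check = pd.merge(movies_sample, ratings_sample, on="ID", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(len(merged), len(unmatched))
```

If `unmatched` is non-empty on the real data, it is worth deciding whether those rows should be kept before analysing the merged table.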
## Let's look at the distribution of the ratings viewers gave across all movies:
import plotly.express as px
import plotly.offline as pyo
# Enable offline plotting inside the notebook
pyo.init_notebook_mode()
# Count how often each rating value was given (a new name avoids overwriting the ratings DataFrame)
rating_counts = data["Ratings"].value_counts()
numbers = rating_counts.index
quantity = rating_counts.values
fig = px.pie(values=quantity, names=numbers)
fig.show()
So, according to the pie chart above, 8 is the rating users give most often. This suggests that most of the ratings in the dataset are positive.
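The same reading can be checked numerically rather than by eye. A minimal sketch, assuming that a rating of 7 or higher counts as "positive" (a threshold chosen here for illustration, not taken from the dataset); the values in `ratings_column` are made-up toy data:

```python
import pandas as pd

# Toy ratings, not taken from the real dataset.
ratings_column = pd.Series([8, 8, 8, 10, 10, 7, 5, 1], name="Ratings")

# normalize=True returns the share of each rating value instead of raw counts.
shares = ratings_column.value_counts(normalize=True).sort_index()

# Assumption: ratings of 7 or more are treated as positive.
positive_share = shares[shares.index >= 7].sum()
print(shares)
print(positive_share)
```

On the real data, `data["Ratings"]` would take the place of `ratings_column`.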
As 10 is the highest rating a viewer can give, let’s take a look at the 10 movies that received the most ratings of 10 from viewers:
data2 = data.query("Ratings == 10")
print(data2["Title"].value_counts().head(10))
Joker (2019)                        1479
Interstellar (2014)                 1386
1917 (2019)                          820
Avengers: Endgame (2019)             812
The Shawshank Redemption (1994)      707
Gravity (2013)                       653
The Wolf of Wall Street (2013)       581
Hacksaw Ridge (2016)                 570
Avengers: Infinity War (2018)        535
La La Land (2016)                    510
Name: Title, dtype: int64
So, according to this dataset, Joker (2019) received the most ratings of 10 from viewers.
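A raw count of 10s tends to favour movies that many people have rated. One possible complement, sketched below, is to look at the share of 10s per movie and keep only movies with a minimum number of ratings. The titles, the values, and the `min_ratings` threshold are all hypothetical:

```python
import pandas as pd

# Toy data with placeholder titles, not taken from the real dataset.
data_sample = pd.DataFrame(
    {
        "Title": ["A", "A", "A", "B", "B", "C"],
        "Ratings": [10, 10, 4, 10, 9, 10],
    }
)

# Per movie: how many ratings it has and what fraction of them are 10s.
summary = data_sample.groupby("Title")["Ratings"].agg(
    num_ratings="count",
    share_of_tens=lambda s: (s == 10).mean(),
)

# Hypothetical threshold: ignore movies with too few ratings to be meaningful.
min_ratings = 2
ranked = summary[summary["num_ratings"] >= min_ratings].sort_values(
    "share_of_tens", ascending=False
)
print(ranked)
```

Here movie "C" is excluded because it has only one rating, even though that rating is a 10.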